The Bias-Variance Tradeoff and the Randomized GACV

Authors

  • Grace Wahba
  • Xiwu Lin
  • Fangyu Gao
  • Dong Xiang
  • Ronald Klein
  • Barbara E. Klein
Abstract

We propose a new in-sample cross-validation-based method (randomized GACV) for choosing smoothing or bandwidth parameters that govern the bias-variance (or fit-complexity) tradeoff in 'soft' classification. Soft classification refers to a learning procedure that estimates the probability that an example with a given attribute vector is in class 1 versus class 0. The target for optimizing the tradeoff is the Kullback-Leibler distance between the estimated probability distribution and the 'true' probability distribution, representing knowledge of an infinite population. The method uses a randomized estimate of the trace of a Hessian and mimics cross validation at the cost of a single relearning with perturbed outcome data.
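The abstract's key computational trick is the randomized trace estimate: perturb the outcomes once, refit once, and take the inner product of the perturbation with the change in fitted values to estimate the trace of the influence (Hessian-related) matrix. As a rough illustration only, the sketch below applies that trick to a quadratic-loss analogue (Girard/Hutchinson-style randomized GCV for a linear smoother) rather than the paper's Bernoulli-likelihood setting; the kernel, the function names (fit, randomized_trace, gcv_score), and all parameter values are hypothetical.

import numpy as np

def fit(K, y, lam):
    # One "learning" pass: a kernel-ridge-style linear smoother,
    # f_hat = K (K + n*lam*I)^{-1} y.  Stand-in for the real learner.
    n = len(y)
    return K @ np.linalg.solve(K + n * lam * np.eye(n), y)

def randomized_trace(K, y, lam, sigma=1e-3, rng=None):
    # Randomized trace estimate of A(lam) = d f_hat / d y at the cost
    # of a single extra fit with perturbed outcomes:
    #   tr(A) ~= delta^T (f(y + delta) - f(y)) / sigma^2.
    rng = rng or np.random.default_rng()
    delta = sigma * rng.standard_normal(len(y))
    return delta @ (fit(K, y + delta, lam) - fit(K, y, lam)) / sigma**2

def gcv_score(K, y, lam, rng=None):
    # GCV(lam) = n * RSS / (n - tr(A))^2, with the randomized trace
    # standing in for the exact (expensive) trace.
    n = len(y)
    rss = np.sum((y - fit(K, y, lam)) ** 2)
    return n * rss / (n - randomized_trace(K, y, lam, rng=rng)) ** 2

# Toy usage: pick lam on a grid for noisy observations of a smooth curve.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)       # toy RBF Gram matrix
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(100)
lam_best = min(np.logspace(-6, 0, 13), key=lambda lam: gcv_score(K, y, lam, rng))
print("selected smoothing parameter:", lam_best)

In the paper's setting the quadratic loss is replaced by the Bernoulli log likelihood, the linear smoother by a penalized-likelihood fit, and the GCV score by the GACV score, but the single perturbed refit plays the same role.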


Similar resources

Generalized Forced Quantitative Randomized Response Model

A new generalized forced quantitative randomized response (GFQRR) model for estimating the population total of a sensitive variable is proposed and studied under a unified setup. The bias and variance expressions are derived under an unequal probability sampling design. It is shown that the models due to Eichhorn and Hayre (1983), Bar-Lev, Bobovitch, and Boukai (2004), Liu and Chow (1976a, 1976b),...

Bias-variance analysis in estimating true query model for information retrieval

The estimation of the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimate is expected to be not only effective, in terms of high mean retrieval performance over all queries, but also stable, in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stabil...

Bias-Variance Techniques for Monte Carlo Optimization: Cross-validation for the CE Method

In this paper, we examine the CE method in the broad context of Monte Carlo Optimization (MCO) [Ermoliev and Norkin, 1998, Robert and Casella, 2004] and Parametric Learning (PL), a type of machine learning. A well-known overarching principle used to improve the performance of many PL algorithms is the bias-variance tradeoff [Wolpert, 1997]. This tradeoff has been used to improve PL algorithms r...

On Bias Plus Variance

This paper presents a Bayesian additive “correction” to the familiar quadratic loss bias-plus-variance formula. It then discusses some other loss-function-specific aspects of supervised learning. It ends by presenting a version of the bias-plus-variance formula appropriate for log loss, and then the Bayesian additive correction to that formula. Both the quadratic loss and log loss correction ter...
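For reference, the "familiar quadratic loss bias-plus-variance formula" this abstract builds on is the standard decomposition below (stated here for context; it is not quoted from the truncated abstract). For y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², an estimator \hat f trained on a random sample satisfies

\mathbb{E}\bigl[(\hat f(x) - y)^2\bigr]
  = \underbrace{\bigl(\mathbb{E}\,\hat f(x) - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\bigl(\hat f(x)\bigr)}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}},

and the Bayesian "correction" the abstract describes is an additive term appended to this identity.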

Approximate Smoothing Spline Methods for Large Data Sets in the Binary Case

We consider the use of smoothing splines in generalized additive models with binary responses in the large-data-set setting. Xiang and Wahba (1996) proposed using the Generalized Approximate Cross Validation (GACV) function as a method to choose (multiple) smoothing parameters in the binary data case and demonstrated through simulation that the GACV method compares well to existing iterative ...
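For orientation, the binary-response fits whose smoothing parameters GACV selects are typically obtained by minimizing a penalized Bernoulli log likelihood of the form below (a standard formulation supplied for context, not quoted from the abstract):

I_\lambda(f) = \frac{1}{n} \sum_{i=1}^{n} \Bigl[ -y_i f(x_i) + \log\bigl(1 + e^{f(x_i)}\bigr) \Bigr] + \lambda J(f),

where f is the logit of the class-1 probability, y_i ∈ {0, 1}, J is a quadratic roughness penalty, and λ (possibly a vector of multiple smoothing parameters) is what the GACV criterion chooses.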


Publication date: 1998